    Lattice calculations in heavy hadron physics

    Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins

    A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method for predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. This analysis showed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This observation led to the development of a prediction method in which patches of surface residues were selected so as to exclude residues with negative electrostatic scores. The method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins, and correct predictions were made for 68% of the data set.
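    The patch-based prediction described above can be illustrated with a minimal Python sketch. It is not the published implementation: the `Residue` class, the use of one representative coordinate per residue, the k-nearest-neighbour definition of a patch, the mean as the patch score, and the names `build_patch`, `patch_score` and `predict_binding_patch` are all hypothetical. Only two ideas come from the text: residues with negative electrostatic scores are excluded from patches, and patches are ranked by electrostatic score with the top 10% retained.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """A surface residue (hypothetical input format): a name, one
    representative 3D position (e.g. a C-alpha coordinate) and a
    per-residue electrostatic score."""
    name: str
    xyz: tuple[float, float, float]
    score: float


def build_patch(centre: Residue, residues: list[Residue], size: int) -> list[Residue]:
    """Return the centre plus its nearest neighbours, up to `size` residues,
    skipping any residue with a negative electrostatic score."""
    candidates = [r for r in residues if r is not centre and r.score >= 0.0]
    candidates.sort(key=lambda r: math.dist(r.xyz, centre.xyz))
    return [centre] + candidates[: size - 1]


def patch_score(patch: list[Residue]) -> float:
    """Assumed patch score: the mean electrostatic score of its residues."""
    return sum(r.score for r in patch) / len(patch)


def predict_binding_patch(residues: list[Residue], size: int = 5, top_fraction: float = 0.10):
    """Build one patch per non-negative residue, rank patches by score
    (highest first) and return (best_patch, top_patches)."""
    patches = [build_patch(r, residues, size) for r in residues if r.score >= 0.0]
    if not patches:
        raise ValueError("no residues with non-negative electrostatic scores")
    patches.sort(key=patch_score, reverse=True)
    n_top = max(1, math.ceil(len(patches) * top_fraction))
    return patches[0], patches[:n_top]
```

    With this design, a strongly negative residue lying right next to a positive cluster is simply left out of the patch rather than pulling its score down, which mirrors the exclusion rule stated in the abstract; the patch size and the averaging are free choices in this sketch.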

    Teaching data science and cloud computing in low and middle income countries

    Large, publicly available data sets present a challenge and an opportunity for researchers based in Low and Middle Income Countries (LMIC). The challenge is how these researchers can make use of such data sets given poor connectivity and infrastructure. The opportunity is the ability to perform leading-edge research using these data sets and hence avoid investing substantial resources in generating them. The outcome would be solutions to the substantial local problems encountered in these countries and an educated workforce in data science. Cloud computing in particular may well close the infrastructure gap. In this paper we discuss our experiences of teaching a variety of summer schools on data-intensive analysis in bioinformatics in China, Namibia and Malaysia. On the basis of these experiences, we propose that a larger series of summer schools in data science and cloud computing in LMIC would create a cadre of data scientists to start this process. Finally, we discuss the possibility of providing cloud computing resources with controlled usage costs, so that they are affordable for LMIC researchers.

    The application of Hadoop in structural bioinformatics

    The paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework for analysing large fractions of the Protein Data Bank, which is key for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, together with their scalability. We find that these deployments, for the most part, call known executables from MapReduce rather than rewriting the algorithms. Their scalability varies relative to other batch schedulers, and this is difficult to assess because direct comparisons of Hadoop with batch schedulers on the same platform are absent from the literature; we note, however, some evidence that Message Passing Interface (MPI) implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of its interface and of configuring a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases, and standardised approaches such as workflow languages (e.g. Workflow Description Language, Common Workflow Language and Nextflow) are taken up.
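    To illustrate the pattern the review finds most common, wrapping an existing executable rather than rewriting the algorithm, here is a minimal Hadoop Streaming mapper sketch in Python. Hadoop Streaming feeds input lines to the mapper on stdin and, by default, treats the text before the first tab of each output line as the key. The command name `my_structure_tool`, the one-structure-ID-per-line input format and all function names are hypothetical; this does not reproduce any specific implementation from the reviewed literature.

```python
import subprocess
import sys

# Hypothetical stand-in for an existing structural bioinformatics executable
# (e.g. a docking or alignment program); it must be installed on every node.
DEFAULT_COMMAND = ("my_structure_tool",)


def run_tool(command, structure_id):
    """Run the external program on one structure ID and return its trimmed stdout."""
    result = subprocess.run(
        list(command) + [structure_id], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def mapper(lines, command):
    """Yield tab-separated key/value records for Hadoop Streaming: the
    structure ID is the key and the tool's output is the value.
    Blank input lines are skipped."""
    for line in lines:
        structure_id = line.strip()
        if structure_id:
            yield f"{structure_id}\t{run_tool(command, structure_id)}"


def main(command=DEFAULT_COMMAND):
    """Entry point for the deployed mapper script: read structure IDs from
    stdin and write one record per line to stdout."""
    for record in mapper(sys.stdin, command):
        print(record)

# In the deployed script, call main() under an `if __name__ == "__main__":` guard.
```

    A script like this would be passed to Hadoop Streaming through its `-mapper` option. The design choice is that the scientific code stays untouched: Hadoop only distributes the list of inputs across nodes and collects the per-structure results.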

    The CODATA-RDA Data Steward School

    Given the expected increase in demand for Data Stewards and Data Stewardship skills, there is a clear need to develop training, education and CPD (continuous professional development) in this area. This paper provides a brief introduction to the origin of definitions of Data Stewardship and notes the present tendency to treat Data Stewardship skills as equivalent to the FAIR principles. It then focuses on one specific training event: the pilot Data Stewardship strand of the CODATA-RDA Research Data Science schools, which, by the time of the IDCC meeting, will have been held in Trieste in August 2019. The paper discusses the overall curriculum for the pilot school, how it matches the FAIR4S framework, and plans for gathering feedback from the students. Finally, the paper discusses future plans for the school, in particular how to deepen the integration between the Data Stewardship strand and the Early Career Researcher strand. [This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review.]